1 Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems

Robert W. Wisniewski, Bryan Rosenburg 

November 2003 Proceedings of the 2003 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society 

Full text available: pdf(250.19 KB) Additional Information: full citation, abstract

Programming, understanding, and tuning the performance of large multiprocessor 
systems is challenging. Experts have difficulty achieving good utilization for applications 
on large machines. The task of implementing a scalable system such as an operating 
system or database on large machines is even more challenging. And the importance of 
achieving good performance on multiprocessor machines is increasing as the number of 
cores per chip increases and as the size of multiprocessors increases. Cruci ... 



2 Quartz: a tool for tuning parallel program performance
Thomas E. Anderson, Edward D. Lazowska 

April 1990 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
1990 ACM SIGMETRICS conference on Measurement and modeling of
computer systems SIGMETRICS '90, volume 18 issue 1

Publisher: ACM Press 

Full text available: pdf(1.51 MB) Additional Information: full citation, abstract, references, citings, index terms



Initial implementations of parallel programs typically yield disappointing performance. 
Tuning to improve performance is thus a significant part of the parallel programming 
process. The effort required to tune a parallel program, and the level of performance that 
eventually is achieved, both depend heavily on the quality of the instrumentation that is 
available to the programmer. This paper describes Quartz, a new tool for tuning parallel 
program performance on shared memory mult ... 

3 FAST: A large scale expert system for application and system software performance
tuning

A. E. Irgon, A. H. Dragoni, T. O. Huleatt 

May 1988 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
1988 ACM SIGMETRICS conference on Measurement and modeling of
computer systems SIGMETRICS '88, Volume 16 issue 1

Publisher: ACM Press 

Full text available: pdf(499.17 KB) Additional Information: full citation, references, citings, index terms

4 A methodology for tuning and verifying package simulation models
David C. Efron 

August 1975 Proceedings of the 3rd symposium on Simulation of computer systems 
Publisher: IEEE Press 

Full text available: pdf(790.87 KB) Additional Information: full citation, abstract, references, citings, index terms

The computer system simulation packages are generally regarded as being capable of 
producing viable performance projections quickly and cheaply relative to the time and 
cost of programming unique simulation models. Many users also recognize that simulation 
models cast in the prescribed molds of the packages may be subject to various errors. 
They will therefore consider all results as coarse indications of expected performance 
levels. In contrast, this paper demonstrates how the p ... 

5 Tuning: tools and techniques
J. P. Buzen 

September 1976 ACM SIGMETRICS Performance Evaluation Review, Volume 5 issue 4 
Publisher: ACM Press 

Full text available: pdf(635.86 KB) Additional Information: full citation, abstract

Tuning is basically a two stage process: the first stage consists of detecting performance 
problems within a system, and the second stage consists of changing the system to 
correct these problems. Measurement tools such as hardware monitors, software monitors 
and accounting packages are typically used in the first stage, and tools such as 
optimizers, simulators and balancers are sometimes used in the second stage. 

6 Active harmony: towards automated performance tuning 
Cristian Țăpuș, I-Hsin Chung, Jeffrey K. Hollingsworth

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society Press 

Full text available: pdf(659.48 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we present the Active Harmony automated runtime tuning system. We 
describe the interface used by programs to make applications tunable. We present the 
Library Specification Layer which helps program library developers expose multiple 
variations of the same API using different algorithms. The Library Specification Language 
helps to select the most appropriate program library to tune the overall performance. We 
also present the optimization algorithm used to adjust parameters in the ... 

7 The Paragon performance monitoring environment

B. Ries, R. Anderson, W. Auld, D. Breazeal, K. Callaghan, E. Richards, W. Smith 
December 1993 Proceedings of the 1993 ACM/IEEE conference on Supercomputing 

Publisher: ACM Press 

Full text available: pdf(1.13 MB) Additional Information: full citation, references, citings, index terms



8 Performance measurements for multithreaded programs
Minwen Ji, Edward W. Felten, Kai Li 

June 1998 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
1998 ACM SIGMETRICS joint international conference on Measurement
and modeling of computer systems SIGMETRICS '98/PERFORMANCE '98,
Volume 26 Issue 1
Publisher: ACM Press 

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms

Multithreaded programming is an effective way to exploit concurrency, but it is difficult to 
debug and tune a highly threaded program. This paper describes a performance tool 
called Tmon for monitoring, analyzing and tuning the performance of multithreaded 
programs. The performance tool has two novel features: It uses "thread waiting time" as a 
measure and constructs thread waiting graphs to show thread dependencies and thus 
performance bottlenecks, and it identifies "semi-busy-waiting" points w ... 

9 Architectural support for performance tuning: a case study on the SPARCcenter 2000
A. Singhal, A. J. Goldberg 

April 1994 ACM SIGARCH Computer Architecture News, Proceedings of the 21st
annual international symposium on Computer architecture ISCA '94, volume
22 Issue 2

Publisher: IEEE Computer Society Press, ACM Press 

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms

Latency hiding techniques such as multilevel cache hierarchies yield high performance 
when applications map well onto hierarchy implementations, but performance can suffer 
drastically when they do not. Identifying and reducing mismatches between an application 
and the memory hierarchy is difficult without insight into the actual behavior of the 
hardware implementation. We advocate the use of hardware event counters, as a cheap, 
effective and practical way to tune applications for a given hardwar ... 

10 Monitoring program behaviour on SUPRENUM
Markus Siegle, Richard Hofmann 

April 1992 ACM SIGARCH Computer Architecture News, Proceedings of the 19th
annual international symposium on Computer architecture ISCA '92, volume
20 Issue 2
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms



Full text available: 



It is often very difficult for programmers of parallel computers to understand how their 
parallel programs behave at execution time, because there is not enough insight into the 
interactions between concurrent activities in the parallel machine. Programmers do not 
only wish to obtain statistical information that can be supplied by profiling, for example. 
They need to have detailed knowledge about the functional behaviour of their programs. 
Considering performance aspects, they need timing i ... 

11 ChaosMON: application-specific monitoring and display of performance information
for parallel and distributed systems
Carol Kilpatrick, Karsten Schwan 

December 1991 ACM SIGPLAN Notices, Proceedings of the 1991 ACM/ONR workshop
on Parallel and distributed debugging PADD '91, Volume 26 issue 12
Publisher: ACM Press 

Full text available: pdf(1.08 MB) Additional Information: full citation, references, citings, index terms



12 Normalized performance indices for message passing parallel programs
Sekhar R. Sarukkai, Jerry Yan, Jacob K. Gotwals 

July 1994 Proceedings of the 8th international conference on Supercomputing 
Publisher: ACM Press 

Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, references, citings, index terms

Existing tools for locating performance bottlenecks of message passing parallel programs
either provide visualizations or profiles of program executions only; they do not highlight 
the cause of poor program performance. From the perspective of the application, the 
location and cause of performance problems in terms of procedures, processors and data 
structures are all important. Identifying the cause of poor performance necessitates the 
need to expose how well the underlyin ... 

13 Adaptive QoS parameters approach to modeling Internet performance 
Shin-Jer Yang, Hung-Cheng Chou 

January 2003 International Journal of Network Management, Volume 13 issue 1
Publisher: John Wiley & Sons, Inc. 

Full text available: pdf(139.90 KB) Additional Information: full citation, abstract, references, index terms

Due to the recent advances in Internet technologies and applications, the issue of Quality
of Service (QoS) is more essential to Internet performance. In this paper, we address and 
discuss the influence factors and also finalize the QoS parameters for Internet 
performance. Then we present the simulation procedure for monitoring the performance 
evaluation and propose the algorithm for tuning the performance value. Based on 
simulation results and performance analysis, we can tune and adjust possib ...

14 A relational approach to monitoring complex systems 
Richard Snodgrass 

May 1988 ACM Transactions on Computer Systems (TOCS), volume 6 issue 2 
Publisher: ACM Press 

Full text available: pdf(3.42 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Monitoring is an essential part of many program development tools, and plays a central
role in debugging, optimization, status reporting, and reconfiguration. Traditional 
monitoring techniques are inadequate when monitoring complex systems such as 
multiprocessors or distributed systems. A new approach is described in which a historical 
database forms the conceptual basis for the information processed by the monitor. This 
approach permits advances in specifying the low-level data collection, ... 

15 Improving interactive performance using TIPME 
Yasuhiro Endo, Margo Seltzer 

June 2000 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
2000 ACM SIGMETRICS international conference on Measurement and
modeling of computer systems SIGMETRICS '00, volume 28 issue 1
Publisher: ACM Press 

Full text available: pdf(1.05 MB) Additional Information: full citation, abstract, references, citings, index terms

On the vast majority of today's computers, the dominant form of computation is GUI- 
based user interaction. In such an environment, the user's perception is the final arbiter 
of performance. Human-factors research shows that a user's perception of performance is 
affected by unexpectedly long delays. However, most performance-tuning techniques 
currently rely on throughput-sensitive benchmarks. While these techniques improve the 
average performance of the system, they do littl ... 

Keywords: interactive performance, monitoring 



16 Adaptive self-tuning memory in DB2 

Adam J. Storm, Christian Garcia-Arellano, Sam S. Lightstone, Yixin Diao, M. Surendra 
September 2006 Proceedings of the 32nd international conference on Very large data
bases VLDB '06 

Publisher: VLDB Endowment 

Full text available: pdf(792.72 KB) Additional Information: full citation, abstract, references, index terms

DB2 for Linux, UNIX, and Windows Version 9.1 introduces the Self-Tuning Memory 
Manager (STMM), which provides adaptive self tuning of both database memory heaps 
and cumulative database memory allocation. This technology provides state-of-the-art 
memory tuning combining control theory, runtime simulation modeling, cost-benefit 
analysis, and operating system resource analysis. In particular, the novel use of cost-
benefit analysis and control theory techniques makes STMM a breakthrough technology
in ... 



17 A scalable cross-platform infrastructure for application performance tuning using 
hardware counters 

S. Browne, J. Dongarra, N. Garner, K. London, P. Mucci

November 2000 Proceedings of the 2000 ACM/IEEE conference on Supercomputing 
(CDROM) 

Publisher: IEEE Computer Society 

Full text available: pdf(2.82 MB), Publisher Site
Additional Information: full citation, abstract, references, citings, index terms

The purpose of the PAPI project is to specify a standard API for accessing hardware 
performance counters available on most modern microprocessors. These counters exist as 
a small set of registers that count "events", which are occurrences of specific signals and 
states related to the processor's function. Monitoring these events facilitates correlation 
between the structure of source/object code and the efficiency of the mapping of that 
code to the underlying architecture. This ... 

18 Stardust: tracking activity in a distributed storage system 
Eno Thereska, Brandon Salmon, John Strunk, Matthew Wachs, Michael Abd-El-Malek, Julio
Lopez, Gregory R. Ganger

June 2006 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
joint international conference on Measurement and modeling of computer
systems SIGMETRICS '06/Performance '06, Volume 34 issue 1
Publisher: ACM Press 

Full text available: pdf(895.31 KB) Additional Information: full citation, abstract, references, index terms

Performance monitoring in most distributed systems provides minimal guidance for 
tuning, problem diagnosis, and decision making. Stardust is a monitoring infrastructure 
that replaces traditional performance counters with end-to-end traces of requests and 
allows for efficient querying of performance metrics. Such traces better inform key 
administrative performance challenges by enabling, for example, extraction of per- 
workload, per-resource demand information and per-workload latency graphs. This ... 

Keywords: Ursa Minor, end-to-end tracing, request causal chain 



19 IMPuLSE: integrated monitoring and profiling for large-scale environments
Patrick G. Bridges, Arthur B. Maccabe

October 2004 Proceedings of the 7th workshop on Workshop on languages, 
compilers, and run-time support for scalable systems LCR '04 

Publisher: ACM Press 

Full text available: pdf(99.07 KB) Additional Information: full citation, abstract, references

A lack of efficient system software is an increasing impediment to deploying large-scale 
parallel and distributed systems. Systemically addressing operating system-induced 
performance anomalies requires accurate, low-overhead, whole-system monitoring,
something that is currently unavailable in large tightly-coupled systems. In this paper, we
present the design of IMPuLSE (Integrated Monitoring and Profiling for Large-Scale
Environments), a system we are developing to meet this need. IMPuLSE's i ...

20 Session 3: Scalability and resource usage of an OLAP benchmark on clusters of PCs

Michela Taufer, Thomas Stricker, Roger Weber
August 2002 Proceedings of the fourteenth annual ACM symposium on Parallel
algorithms and architectures 

Publisher: ACM Press 

Full text available: pdf(219.90 KB) Additional Information: full citation, abstract, references, index terms

Designing clusters of PCs for distributed databases processing OLAP (On Line Analytical
Processing) workloads in parallel with good scalability remains a particular challenge as 
we are lacking a deep understanding of the architectural issues around resource usage by 
standard DBMSs on distributed platforms. To address this problem, we present a novel
performance monitoring framework for filtering and abstracting samples of performance 
data from low level counters into a high level performance pictu ... 

Keywords: cluster of PCs, distributed OLAP processing, parallel databases, performance 
analysis, workload characterization 